HTTP

Import flat files from the web

The utils functions to import flat file data, such as read.csv() and read.delim(), are capable of automatically importing from URLs that point to flat files on the web.

You must be wondering whether Hadley Wickham’s alternative package, readr, is equally potent. Well, figure it out in this exercise! The URLs for both a .csv file as well as a .delim file are already coded for you. It’s up to you to actually import the data. If it works, that is…

# Load the readr package
library(readr)

# Import the csv file: pools
url_csv <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"
pools <- read_csv(url_csv)
## Parsed with column specification:
## cols(
##   Name = col_character(),
##   Address = col_character(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )
# Import the txt file: potatoes
url_delim <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/potatoes.txt"
potatoes <- read_tsv(url_delim)
## Parsed with column specification:
## cols(
##   area = col_double(),
##   temp = col_double(),
##   size = col_double(),
##   storage = col_double(),
##   method = col_double(),
##   texture = col_double(),
##   flavor = col_double(),
##   moistness = col_double()
## )
# Print pools and potatoes
pools %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689")
Name Address Latitude Longitude
Acacia Ridge Leisure Centre 1391 Beaudesert Road, Acacia Ridge -27.58616 153.0264
Bellbowrie Pool Sugarwood Street, Bellbowrie -27.56547 152.8911
Carole Park Cnr Boundary Road and Waterford Road Wacol -27.60744 152.9315
Centenary Pool (inner City) 400 Gregory Terrace, Spring Hill -27.45537 153.0251
Chermside Pool 375 Hamilton Road, Chermside -27.38583 153.0351
Colmslie Pool (Morningside) 400 Lytton Road, Morningside -27.45516 153.0789
potatoes %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689")
area temp size storage method texture flavor moistness
1 1 1 1 1 2.9 3.2 3.0
1 1 1 1 2 2.3 2.5 2.6
1 1 1 1 3 2.5 2.8 2.8
1 1 1 1 4 2.1 2.9 2.4
1 1 1 1 5 1.9 2.8 2.2
1 1 1 2 1 1.8 3.0 1.7

Secure importing

In the previous exercises, you have been working with URLs that all start with http://. There is, however, a safer alternative to HTTP, namely HTTPS, which stands for HypterText Transfer Protocol Secure. Just remember this: HTTPS is relatively safe, HTTP is not.

Luckily for us, you can use the standard importing functions with https:// connections since R version 3.2.2.

# https URL to the swimming_pools csv file.
url_csv <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"

# Import the file using read.csv(): pools1
pools1 <- read.csv(url_csv)

# Load the readr package
library(readr)

# Import the file using read_csv(): pools2
pools2 <- read_csv(url_csv)
## Parsed with column specification:
## cols(
##   Name = col_character(),
##   Address = col_character(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )
# Print the structure of pools1 and pools2
str(pools1)
## 'data.frame':    20 obs. of  4 variables:
##  $ Name     : Factor w/ 20 levels "Acacia Ridge Leisure Centre",..: 1 2 3 4 5 6 19 7 8 9 ...
##  $ Address  : Factor w/ 20 levels "1 Fairlead Crescent, Manly",..: 5 20 18 10 9 11 6 15 12 17 ...
##  $ Latitude : num  -27.6 -27.6 -27.6 -27.5 -27.4 ...
##  $ Longitude: num  153 153 153 153 153 ...
str(pools2)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 20 obs. of  4 variables:
##  $ Name     : chr  "Acacia Ridge Leisure Centre" "Bellbowrie Pool" "Carole Park" "Centenary Pool (inner City)" ...
##  $ Address  : chr  "1391 Beaudesert Road, Acacia Ridge" "Sugarwood Street, Bellbowrie" "Cnr Boundary Road and Waterford Road Wacol" "400 Gregory Terrace, Spring Hill" ...
##  $ Latitude : num  -27.6 -27.6 -27.6 -27.5 -27.4 ...
##  $ Longitude: num  153 153 153 153 153 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Name = col_character(),
##   ..   Address = col_character(),
##   ..   Latitude = col_double(),
##   ..   Longitude = col_double()
##   .. )

Import Excel files from the web

When you learned about gdata, it was already mentioned that gdata can handle .xls files that are on the internet. readxl can’t, at least not yet. The URL with which you’ll be working is already available in the sample code. You will import it once using gdata and once with the readxl package via a workaround.

# Load gdata package - Does not owrk as Perl Strawberry not installed
library(gdata)

# Specification of url: url_xls
url_xls <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls"

# Import the .xls file with gdata: excel_gdata
excel_gdata <- read.xls(url_xls, perl = "C:/myperl/perl/bin/perl.exe")

excel_gdata %>%
  kable(caption = "Latitude from read.xls") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center", , font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")
Latitude from read.xls
country X1700
Afghanistan 34.5650000
Akrotiri and Dhekelia 34.6166667
Albania 41.3120000
Algeria 36.7200000
American Samoa -14.3070000
Andorra 42.5460000
Angola -8.8430000
Anguilla 18.2500000
Antigua and Barbuda 17.0720000
Argentina -36.6760000
Armenia 40.2540000
Aruba 12.5130000
Australia -32.2190000
Austria 48.2310000
Azerbaijan 40.3520000
Bahamas 24.7000000
Bahrain 26.0240000
Bangladesh 23.8800000
Barbados 13.1790000
Belarus 53.5470900
Belgium 50.8370000
Belize 17.8430000
Benin 6.3640000
Bermuda 32.2170000
Bhutan 27.4790000
Bolivia -15.1900000
Bosnia and Herzegovina 44.1750100
Botswana -21.5360000
Brazil -19.5570000
British Virgin Islands 18.5000000
Brunei 4.5010000
Bulgaria 42.0730000
Burkina Faso 12.0490000
Burundi -3.3650000
Cambodia 12.0260000
Cameroon 10.7300000
Canada 43.7270000
Cape Verde 15.0910000
Cayman Islands 19.3190000
Central African Rep. 4.3310000
Chad 10.3770000
Channel Islands 49.2170000
Chile -33.5540000
China 29.5610000
Christmas Island -10.5000000
Cocos Island -12.5000000
Colombia 4.7880000
Comoros -11.6710000
Congo, Dem. Rep. -2.9202760
Congo, Rep. -3.6840000
Cook Islands -21.2333333
Costa Rica 9.9410000
Cote d’Ivoire 5.4960000
Croatia 45.1100800
Cuba 23.0840000
Cyprus 35.0810000
Czech Rep. 49.7792900
Denmark 55.7180000
Djibouti 11.5050000
Dominica 15.4330000
Dominican Rep. 18.5610000
East Germany NA
Ecuador -2.0620000
Egypt 29.9960000
El Salvador 13.7750000
Equatorial Guinea 2.3260000
Eritrea 15.3120000
Estonia 58.6850000
Ethiopia 9.0070000
Faeroe Islands 62.0000000
Falkland Islands (Malvinas) -51.7500000
Fiji -17.8270000
Finland 60.2120000
France 48.8570000
French Guiana 3.9880000
French Polynesia -17.6660000
Gabon 0.3720000
Gambia 13.2570000
Georgia 42.0340000
Germany 48.1610000
Ghana 6.6940000
Gibraltar 36.1333333
Greece 38.0580000
Greenland 64.2170000
Grenada 12.1150000
Guadeloupe 16.1630000
Guam 13.4400000
Guatemala 14.6220000
Guernsey 49.4666667
Guinea 11.6710000
Guinea-Bissau 12.2620000
Guyana 5.7610000
Haiti 18.9320000
Holy See 41.9000000
Honduras 14.1940000
Hong Kong, China 22.7040000
Hungary 47.4190000
Iceland 63.8920000
India 25.2740000
Indonesia -6.5620000
Iran 35.3800000
Iraq 33.3140000
Ireland 54.6110000
Isle of Man 54.2250000
Israel 32.0840000
Italy 45.4150000
Jamaica 18.0550000
Japan 35.7120000
Jersey 49.2500000
Jordan 31.6020000
Kazakhstan 44.3080000
Kenya -0.5130000
Kiribati 1.8470000
Korea, Dem. Rep. 39.5270000
Korea, Rep. 37.5530000
Kosovo 42.5833333
Kuwait 29.3260000
Kyrgyzstan 41.4894000
Laos 16.5470000
Latvia 56.8580000
Lebanon 34.1110000
Lesotho -29.5950000
Liberia 6.3850000
Libya 32.6070000
Liechtenstein 47.1410000
Lithuania 55.3180000
Luxembourg 49.7800000
Macao, China 22.5060000
Macedonia, FYR 41.5738100
Madagascar -18.9580000
Malawi -15.8110000
Malaysia 3.2690000
Maldives 3.2500000
Mali 12.5080000
Malta 35.8870000
Marshall Islands 9.0000000
Martinique 14.6540000
Mauritania 17.9250000
Mauritius -20.2320000
Mayotte -12.8333333
Mexico 16.7590000
Micronesia, Fed. Sts. 7.3570000
Moldova 47.1660000
Monaco 43.7460000
Mongolia 47.4930000
Montenegro 42.7889900
Montserrat 16.7500000
Morocco 33.5930000
Mozambique -18.4990000
Myanmar 17.6790000
Namibia -17.9790000
Nauru -0.5333333
Nepal 27.7120000
Netherlands 51.8740000
Netherlands Antilles 12.1920000
New Caledonia -21.3290000
New Zealand -36.8920000
Nicaragua 12.2110000
Niger 13.8760000
Nigeria 6.5430000
Niue -19.0333333
Norfolk Island -29.0333333
Northern Mariana Islands 15.1780000
Norway 59.9770000
Oman 20.4450000
Pakistan 31.1730000
Palau 7.5000000
Panama 9.2060000
Papua New Guinea -6.6000000
Paraguay -25.5830000
Peru -11.7940000
Philippines 13.9220000
Pitcairn -25.0666667
Poland 50.2440000
Portugal 38.8160000
Puerto Rico 18.2250000
Qatar 25.3090000
Reunion -20.9540000
Romania 44.5260000
Russia 55.6750000
Rwanda -2.0320000
Saint Barthélemy 17.9000000
Saint Helena -15.9500000
Saint Kitts and Nevis 17.3270000
Saint Lucia 13.8980000
Saint Martin 18.0833333
Saint Vincent and the Grenadines 13.2540000
Saint-Pierre-et-Miquelon 46.8333333
Samoa -13.6330000
San Marino 43.7666667
Sao Tome and Principe 1.0000000
Saudi Arabia 23.0690000
Senegal 14.7720000
Serbia 44.0467300
Seychelles -4.6640000
Sierra Leone 8.7010000
Singapore 1.3550000
Slovak Republic 48.7853800
Slovenia 46.1279000
Solomon Islands -9.6250000
Somalia 10.6330000
South Africa -29.1300000
Spain 37.3980000
Sri Lanka 6.8680000
Sudan 14.0430000
Suriname 5.6050000
Svalbard 78.0000000
Swaziland -26.5450000
Sweden 59.2780000
Switzerland 47.4080000
Syria 33.4580000
Taiwan 23.6448500
Tajikistan 37.8060000
Tanzania -2.1540000
Thailand 13.7700000
Timor-Leste -8.8333333
Togo 6.1940000
Tokelau -9.0000000
Tonga -21.1730000
Trinidad and Tobago 10.4180000
Tunisia 36.8160000
Turkey 41.2020000
Turkmenistan 39.1293800
Turks and Caicos Islands 21.7500000
Tuvalu -8.0000000
Uganda 0.2280000
Ukraine 50.2810000
United Arab Emirates 23.3900000
United Kingdom 51.5100000
United States 34.3600000
Uruguay -34.8220000
USSR NA
Uzbekistan 41.2720000
Wallis et Futuna -13.3000000
Vanuatu -15.2330000
Venezuela 9.8430000
West Bank and Gaza 31.4166667
West Germany NA
Western Sahara 24.6191300
Vietnam 10.7980000
Virgin Islands (U.S.) 17.7360000
Yemen, Rep. 15.2280000
Yugoslavia NA
Zambia -12.9420000
Zimbabwe -17.8760000
Åland 60.0000000
# Load the readxl and gdata package
library(readxl)

# Specification of url: url_xls
url_xls <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls"

# Download file behind URL, name it local_latitude.xls
#download.file(url_xls, destfile = "local_latitude.xls")

# Import the local .xls file with readxl: excel_readxl
excel_readxl <- read_excel("local_latitude.xls")

excel_readxl %>%
  kable(caption = "Latitude from read_excel") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center", , font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")
Latitude from read_excel
country 1700
Afghanistan 34.5650000
Akrotiri and Dhekelia 34.6166667
Albania 41.3120000
Algeria 36.7200000
American Samoa -14.3070000
Andorra 42.5460000
Angola -8.8430000
Anguilla 18.2500000
Antigua and Barbuda 17.0720000
Argentina -36.6760000
Armenia 40.2540000
Aruba 12.5130000
Australia -32.2190000
Austria 48.2310000
Azerbaijan 40.3520000
Bahamas 24.7000000
Bahrain 26.0240000
Bangladesh 23.8800000
Barbados 13.1790000
Belarus 53.5470900
Belgium 50.8370000
Belize 17.8430000
Benin 6.3640000
Bermuda 32.2170000
Bhutan 27.4790000
Bolivia -15.1900000
Bosnia and Herzegovina 44.1750100
Botswana -21.5360000
Brazil -19.5570000
British Virgin Islands 18.5000000
Brunei 4.5010000
Bulgaria 42.0730000
Burkina Faso 12.0490000
Burundi -3.3650000
Cambodia 12.0260000
Cameroon 10.7300000
Canada 43.7270000
Cape Verde 15.0910000
Cayman Islands 19.3190000
Central African Rep. 4.3310000
Chad 10.3770000
Channel Islands 49.2170000
Chile -33.5540000
China 29.5610000
Christmas Island -10.5000000
Cocos Island -12.5000000
Colombia 4.7880000
Comoros -11.6710000
Congo, Dem. Rep. -2.9202760
Congo, Rep. -3.6840000
Cook Islands -21.2333333
Costa Rica 9.9410000
Cote d’Ivoire 5.4960000
Croatia 45.1100800
Cuba 23.0840000
Cyprus 35.0810000
Czech Rep. 49.7792900
Denmark 55.7180000
Djibouti 11.5050000
Dominica 15.4330000
Dominican Rep. 18.5610000
East Germany NA
Ecuador -2.0620000
Egypt 29.9960000
El Salvador 13.7750000
Equatorial Guinea 2.3260000
Eritrea 15.3120000
Estonia 58.6850000
Ethiopia 9.0070000
Faeroe Islands 62.0000000
Falkland Islands (Malvinas) -51.7500000
Fiji -17.8270000
Finland 60.2120000
France 48.8570000
French Guiana 3.9880000
French Polynesia -17.6660000
Gabon 0.3720000
Gambia 13.2570000
Georgia 42.0340000
Germany 48.1610000
Ghana 6.6940000
Gibraltar 36.1333333
Greece 38.0580000
Greenland 64.2170000
Grenada 12.1150000
Guadeloupe 16.1630000
Guam 13.4400000
Guatemala 14.6220000
Guernsey 49.4666667
Guinea 11.6710000
Guinea-Bissau 12.2620000
Guyana 5.7610000
Haiti 18.9320000
Holy See 41.9000000
Honduras 14.1940000
Hong Kong, China 22.7040000
Hungary 47.4190000
Iceland 63.8920000
India 25.2740000
Indonesia -6.5620000
Iran 35.3800000
Iraq 33.3140000
Ireland 54.6110000
Isle of Man 54.2250000
Israel 32.0840000
Italy 45.4150000
Jamaica 18.0550000
Japan 35.7120000
Jersey 49.2500000
Jordan 31.6020000
Kazakhstan 44.3080000
Kenya -0.5130000
Kiribati 1.8470000
Korea, Dem. Rep. 39.5270000
Korea, Rep. 37.5530000
Kosovo 42.5833333
Kuwait 29.3260000
Kyrgyzstan 41.4894000
Laos 16.5470000
Latvia 56.8580000
Lebanon 34.1110000
Lesotho -29.5950000
Liberia 6.3850000
Libya 32.6070000
Liechtenstein 47.1410000
Lithuania 55.3180000
Luxembourg 49.7800000
Macao, China 22.5060000
Macedonia, FYR 41.5738100
Madagascar -18.9580000
Malawi -15.8110000
Malaysia 3.2690000
Maldives 3.2500000
Mali 12.5080000
Malta 35.8870000
Marshall Islands 9.0000000
Martinique 14.6540000
Mauritania 17.9250000
Mauritius -20.2320000
Mayotte -12.8333333
Mexico 16.7590000
Micronesia, Fed. Sts. 7.3570000
Moldova 47.1660000
Monaco 43.7460000
Mongolia 47.4930000
Montenegro 42.7889900
Montserrat 16.7500000
Morocco 33.5930000
Mozambique -18.4990000
Myanmar 17.6790000
Namibia -17.9790000
Nauru -0.5333333
Nepal 27.7120000
Netherlands 51.8740000
Netherlands Antilles 12.1920000
New Caledonia -21.3290000
New Zealand -36.8920000
Nicaragua 12.2110000
Niger 13.8760000
Nigeria 6.5430000
Niue -19.0333333
Norfolk Island -29.0333333
Northern Mariana Islands 15.1780000
Norway 59.9770000
Oman 20.4450000
Pakistan 31.1730000
Palau 7.5000000
Panama 9.2060000
Papua New Guinea -6.6000000
Paraguay -25.5830000
Peru -11.7940000
Philippines 13.9220000
Pitcairn -25.0666667
Poland 50.2440000
Portugal 38.8160000
Puerto Rico 18.2250000
Qatar 25.3090000
Reunion -20.9540000
Romania 44.5260000
Russia 55.6750000
Rwanda -2.0320000
Saint Barthélemy 17.9000000
Saint Helena -15.9500000
Saint Kitts and Nevis 17.3270000
Saint Lucia 13.8980000
Saint Martin 18.0833333
Saint Vincent and the Grenadines 13.2540000
Saint-Pierre-et-Miquelon 46.8333333
Samoa -13.6330000
San Marino 43.7666667
Sao Tome and Principe 1.0000000
Saudi Arabia 23.0690000
Senegal 14.7720000
Serbia 44.0467300
Seychelles -4.6640000
Sierra Leone 8.7010000
Singapore 1.3550000
Slovak Republic 48.7853800
Slovenia 46.1279000
Solomon Islands -9.6250000
Somalia 10.6330000
South Africa -29.1300000
Spain 37.3980000
Sri Lanka 6.8680000
Sudan 14.0430000
Suriname 5.6050000
Svalbard 78.0000000
Swaziland -26.5450000
Sweden 59.2780000
Switzerland 47.4080000
Syria 33.4580000
Taiwan 23.6448500
Tajikistan 37.8060000
Tanzania -2.1540000
Thailand 13.7700000
Timor-Leste -8.8333333
Togo 6.1940000
Tokelau -9.0000000
Tonga -21.1730000
Trinidad and Tobago 10.4180000
Tunisia 36.8160000
Turkey 41.2020000
Turkmenistan 39.1293800
Turks and Caicos Islands 21.7500000
Tuvalu -8.0000000
Uganda 0.2280000
Ukraine 50.2810000
United Arab Emirates 23.3900000
United Kingdom 51.5100000
United States 34.3600000
Uruguay -34.8220000
USSR NA
Uzbekistan 41.2720000
Wallis et Futuna -13.3000000
Vanuatu -15.2330000
Venezuela 9.8430000
West Bank and Gaza 31.4166667
West Germany NA
Western Sahara 24.6191300
Vietnam 10.7980000
Virgin Islands (U.S.) 17.7360000
Yemen, Rep. 15.2280000
Yugoslavia NA
Zambia -12.9420000
Zimbabwe -17.8760000
Åland 60.0000000

It appears that readxl is not (yet?) able to deal with Excel files that are on the web. However, a simply workaround with download.file() fixes this.

Downloading any file, secure or not

In the previous exercise you’ve seen how you can read excel files on the web using the read_excel package by first downloading the file with the download.file() function.

There’s more: with download.file() you can download any kind of file from the web, using HTTP and HTTPS: images, executable files, but also .RData files. An RData file is very efficient format to store R data.

You can load data from an RData file using the load() function, but this function does not accept a URL string as an argument. In this exercise, you’ll first download the RData file securely, and then import the local data file.

# https URL to the wine RData file.
url_rdata <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/wine.RData"

# Download the wine file to your working directory
download.file(url_rdata, destfile = "wine_local.RData")

# Load the wine data into your workspace using load()
load("wine_local.RData")

# Print out the summary of the wine data
sum_wine <- data.frame(summary(wine))
x <- c("name", "Variable", "Statistics")
colnames(sum_wine) <- x
sum_wine[, 2:3] %>%
  kable(caption = "Summary statistics of Wine data frame") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center", , font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")
Summary statistics of Wine data frame
Variable Statistics
Alcohol Min. :11.03
Alcohol 1st Qu.:12.36
Alcohol Median :13.05
Alcohol Mean :12.99
Alcohol 3rd Qu.:13.67
Alcohol Max. :14.83
Malic acid Min. :0.74
Malic acid 1st Qu.:1.60
Malic acid Median :1.87
Malic acid Mean :2.34
Malic acid 3rd Qu.:3.10
Malic acid Max. :5.80
Ash Min. :1.360
Ash 1st Qu.:2.210
Ash Median :2.360
Ash Mean :2.366
Ash 3rd Qu.:2.560
Ash Max. :3.230
Alcalinity of ash Min. :10.60
Alcalinity of ash 1st Qu.:17.20
Alcalinity of ash Median :19.50
Alcalinity of ash Mean :19.52
Alcalinity of ash 3rd Qu.:21.50
Alcalinity of ash Max. :30.00
Magnesium Min. : 70.00
Magnesium 1st Qu.: 88.00
Magnesium Median : 98.00
Magnesium Mean : 99.59
Magnesium 3rd Qu.:107.00
Magnesium Max. :162.00
Total phenols Min. :0.980
Total phenols 1st Qu.:1.740
Total phenols Median :2.350
Total phenols Mean :2.292
Total phenols 3rd Qu.:2.800
Total phenols Max. :3.880
Flavanoids Min. :0.340
Flavanoids 1st Qu.:1.200
Flavanoids Median :2.130
Flavanoids Mean :2.023
Flavanoids 3rd Qu.:2.860
Flavanoids Max. :5.080
Nonflavanoid phenols Min. :0.1300
Nonflavanoid phenols 1st Qu.:0.2700
Nonflavanoid phenols Median :0.3400
Nonflavanoid phenols Mean :0.3623
Nonflavanoid phenols 3rd Qu.:0.4400
Nonflavanoid phenols Max. :0.6600
Proanthocyanins Min. :0.410
Proanthocyanins 1st Qu.:1.250
Proanthocyanins Median :1.550
Proanthocyanins Mean :1.587
Proanthocyanins 3rd Qu.:1.950
Proanthocyanins Max. :3.580
Color intensity Min. : 1.280
Color intensity 1st Qu.: 3.210
Color intensity Median : 4.680
Color intensity Mean : 5.055
Color intensity 3rd Qu.: 6.200
Color intensity Max. :13.000
Hue Min. :1.270
Hue 1st Qu.:1.930
Hue Median :2.780
Hue Mean :2.604
Hue 3rd Qu.:3.170
Hue Max. :4.000
Proline Min. : 278.0
Proline 1st Qu.: 500.0
Proline Median : 672.0
Proline Mean : 745.1
Proline 3rd Qu.: 985.0
Proline Max. :1680.0

Another way to load remote RData files is to use the url() function inside load(). However, this will not save the RData file to a local file.

Remember that you can’t directly use a URL string inside load() to load remote RData files. You should use url() or download the file first using download.file().

HTTP? httr! (1)

Downloading a file from the Internet means sending a GET request and receiving the file you asked for. Internally, all the previously discussed functions use a GET request to download files.

httr provides a convenient function, GET() to execute this GET request. The result is a response object, that provides easy access to the status code, content-type and, of course, the actual content.

You can extract the content from the request using the content() function. At the time of writing, there are three ways to retrieve this content: as a raw object, as a character vector, or an R object, such as a list. If you don’t tell content() how to retrieve the content through the as argument, it’ll try its best to figure out which type is most appropriate based on the content-type.

# Load the httr package
library(httr)

# Get the url, save response to resp
url <- "http://www.example.com/"
resp <- GET(url)

# Print resp
resp
## Response [http://www.example.com/]
##   Date: 2019-02-18 13:39
##   Status: 200
##   Content-Type: text/html; charset=UTF-8
##   Size: 1.27 kB
## <!doctype html>
## <html>
## <head>
##     <title>Example Domain</title>
## 
##     <meta charset="utf-8" />
##     <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
##     <meta name="viewport" content="width=device-width, initial-scale=1" />
##     <style type="text/css">
##     body {
## ...
# Get the raw content of resp: raw_content
raw_content <- content(resp, as = "raw")

# Print the head of raw_content
head(raw_content)
## [1] 3c 21 64 6f 63 74

The raw content of the response doesn’t make a lot of sense, does it? Luckily, the content() function by default, if you don’t specify the as argument, figures out what type of data you’re dealing with and parses it for you.

HTTP? httr! (2)

Web content does not limit itself to HTML pages and files stored on remote servers such as DataCamp’s Amazon S3 instances. There are many other data formats out there. A very common one is JSON. This format is very often used by so-called Web APIs, interfaces to web servers with which you as a client can communicate to get or store information in more complicated ways.

You’ll learn about Web APIs and JSON in the exercises that follow, but some experimentation never hurts, does it?

# Get the url
url <- "http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json"
resp <- GET(url)

# Print resp
print(resp)
## Response [http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json]
##   Date: 2019-02-18 13:39
##   Status: 200
##   Content-Type: application/json; charset=utf-8
##   Size: 929 B
# Print content of resp as text
print(content(resp, as = "text"))
## [1] "{\"Title\":\"Annie Hall\",\"Year\":\"1977\",\"Rated\":\"PG\",\"Released\":\"20 Apr 1977\",\"Runtime\":\"93 min\",\"Genre\":\"Comedy, Romance\",\"Director\":\"Woody Allen\",\"Writer\":\"Woody Allen, Marshall Brickman\",\"Actors\":\"Woody Allen, Diane Keaton, Tony Roberts, Carol Kane\",\"Plot\":\"Neurotic New York comedian Alvy Singer falls in love with the ditzy Annie Hall.\",\"Language\":\"English, German\",\"Country\":\"USA\",\"Awards\":\"Won 4 Oscars. Another 26 wins & 8 nominations.\",\"Poster\":\"https://m.media-amazon.com/images/M/MV5BZDg1OGQ4YzgtM2Y2NS00NjA3LWFjYTctMDRlMDI3NWE1OTUyXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg\",\"Ratings\":[{\"Source\":\"Internet Movie Database\",\"Value\":\"8.0/10\"},{\"Source\":\"Rotten Tomatoes\",\"Value\":\"97%\"},{\"Source\":\"Metacritic\",\"Value\":\"92/100\"}],\"Metascore\":\"92\",\"imdbRating\":\"8.0\",\"imdbVotes\":\"233,030\",\"imdbID\":\"tt0075686\",\"Type\":\"movie\",\"DVD\":\"28 Apr 1998\",\"BoxOffice\":\"N/A\",\"Production\":\"United Artists\",\"Website\":\"N/A\",\"Response\":\"True\"}"
# Print content of resp
print(content(resp))
## $Title
## [1] "Annie Hall"
## 
## $Year
## [1] "1977"
## 
## $Rated
## [1] "PG"
## 
## $Released
## [1] "20 Apr 1977"
## 
## $Runtime
## [1] "93 min"
## 
## $Genre
## [1] "Comedy, Romance"
## 
## $Director
## [1] "Woody Allen"
## 
## $Writer
## [1] "Woody Allen, Marshall Brickman"
## 
## $Actors
## [1] "Woody Allen, Diane Keaton, Tony Roberts, Carol Kane"
## 
## $Plot
## [1] "Neurotic New York comedian Alvy Singer falls in love with the ditzy Annie Hall."
## 
## $Language
## [1] "English, German"
## 
## $Country
## [1] "USA"
## 
## $Awards
## [1] "Won 4 Oscars. Another 26 wins & 8 nominations."
## 
## $Poster
## [1] "https://m.media-amazon.com/images/M/MV5BZDg1OGQ4YzgtM2Y2NS00NjA3LWFjYTctMDRlMDI3NWE1OTUyXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg"
## 
## $Ratings
## $Ratings[[1]]
## $Ratings[[1]]$Source
## [1] "Internet Movie Database"
## 
## $Ratings[[1]]$Value
## [1] "8.0/10"
## 
## 
## $Ratings[[2]]
## $Ratings[[2]]$Source
## [1] "Rotten Tomatoes"
## 
## $Ratings[[2]]$Value
## [1] "97%"
## 
## 
## $Ratings[[3]]
## $Ratings[[3]]$Source
## [1] "Metacritic"
## 
## $Ratings[[3]]$Value
## [1] "92/100"
## 
## 
## 
## $Metascore
## [1] "92"
## 
## $imdbRating
## [1] "8.0"
## 
## $imdbVotes
## [1] "233,030"
## 
## $imdbID
## [1] "tt0075686"
## 
## $Type
## [1] "movie"
## 
## $DVD
## [1] "28 Apr 1998"
## 
## $BoxOffice
## [1] "N/A"
## 
## $Production
## [1] "United Artists"
## 
## $Website
## [1] "N/A"
## 
## $Response
## [1] "True"

The fact that httr converts the JSON response body automatically to an R list is very convenient.